ABSTRACT

In the age of wireless communication, churn has become a pressing issue because of the intense competition among mobile phone companies. Churn is the movement of customers away from their existing company in search of better services, that is, the migration of a customer from one service provider to another. In the present telecommunication market, competition is at its peak and the products and offerings of different companies are increasingly similar. Churn causes a direct loss to the company. In that context, the necessary action can be taken if the likely churners, or the reasons behind churn, can be predicted before customers leave. There is therefore a need for a model that makes the churn problem easier to understand and handle. This paper presents a two-step churn prediction model that aims to provide a simple methodology for overcoming this problem with data mining tools and processes.
INTRODUCTION

Churn prediction in telecommunication companies is a challenging task, and churn can never be predicted with one hundred percent accuracy. Identifying and retaining customers who are likely to churn has become as essential for service providers as acquiring new customers. High churn rates and the substantial revenue lost to churning have made accurate churn prediction and prevention a vital business process. Although churn is inevitable, it can be managed and kept at an acceptable level.

In general, there are many different approaches to churn prediction, and novel techniques continue to emerge alongside the conventional statistical methods. High-quality prediction models have to be continually developed and improved. Valuable customers also have to be identified, which has led to combining churn prediction methods with customer lifetime value techniques. In this paper, a two-step method is proposed as a contribution towards solving the churn prediction problem.

REVIEW OF LITERATURE 

Various models in the literature elaborate on the importance of this work and suggest extensions to it. Across these predictive models, customer churn has been identified as a major problem in the telecom industry, and extensive research has been conducted with the support of various data mining techniques. The core data mining techniques usually applied to customer churn are decision trees and their extensions, neural network based techniques, and regression techniques. From the review and comparison of the models in the literature, it is observed that decision tree based techniques, particularly C5.0 and CART, have outperformed some of the existing data mining techniques, such as regression, in terms of accuracy. Nevertheless, neural networks outperform these techniques, which is attributed to the size of the datasets used and the different feature selection methods applied. Based on these comparisons, the final outcome will be a predictive model for customer churn built on data mining methods and their applications. The proposed predictive model is therefore based mainly on the CART algorithm.

Proposed Hypothetical Concept 

The proposed model generates the telecom churn model in two steps, in addition to a data pre-processing step:

• Defining the churn algorithm, and

• Constructing a predictive model.

To construct a more accurate churn model, we divide the large data set into a training data set and a testing data set in the data pre-processing step; these are used to construct and refine the churn models.

First, the data scoping, which covers problem understanding and data understanding, needs to be defined by experts. For example, a customer churn problem involving contract-end or number-portability customers may arise only in a particular business, product, or customer segment. Meanwhile, the corresponding data need to meet each specific requirement, identified through a feature analysis methodology. Through a series of discussions and analyses, experts may decide that historical billing data, contract status, or call detail records will be useful for constructing the model.

Second, a time window is set for pre-processing the raw data for each churn management problem. We accumulate raw data in 15-day windows to support training and testing of the churn models. We use the training data (the first 15 days) to define the time windows and measures and to construct the churn predictive models. The data set of the second 15 days is then used to predict and verify the effectiveness of those models with an effectiveness measure, and the results are used to refine the churn models.
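As a minimal sketch of this time-window split, assuming each raw record carries a day index counted from the start of data collection, the first 15-day window can be routed to training and the second to testing. The `CallRecord` type, its fields and `split_by_window` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

WINDOW_DAYS = 15  # window length used in the text


@dataclass
class CallRecord:
    # Hypothetical raw record: customer id, day index (1-based), call minutes.
    customer_id: str
    day: int
    minutes: float


def split_by_window(records: List[CallRecord],
                    window: int = WINDOW_DAYS) -> Tuple[List[CallRecord], List[CallRecord]]:
    """Days 1..window go to training, days window+1..2*window to testing."""
    train = [r for r in records if 1 <= r.day <= window]
    test = [r for r in records if window < r.day <= 2 * window]
    # Records outside the two windows are ignored.
    return train, test


if __name__ == "__main__":
    raw = [CallRecord("a", 3, 5.0), CallRecord("a", 18, 2.0),
           CallRecord("b", 15, 1.5), CallRecord("b", 31, 4.0)]
    train, test = split_by_window(raw)
    print(len(train), len(test))  # 2 1
```

Splitting by calendar window rather than at random keeps the test period strictly after the training period, which matches the "predict and verify" use of the second 15 days.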

Constructing a churn prediction model is the core of this study. Several suitable data mining methodologies and algorithms, such as decision trees, SVM, neural networks, regression, and clustering, can be used to construct churn models from the appropriate data and to measure their performance.
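The text does not fix a particular effectiveness measure. As an illustration only, the sketch below computes accuracy, precision and recall from actual and predicted labels; treating the label "COC" (the churn label used in Step 2) as the positive class is our assumption, and `churn_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def churn_metrics(actual: Sequence[str], predicted: Sequence[str],
                  positive: str = "COC") -> Dict[str, float]:
    """Accuracy, precision and recall with `positive` as the churn class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted churners that really churned.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: share of real churners that the model caught.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Recall is often watched closely in churn work because a missed churner is lost revenue, but which measure to optimise remains a design decision for the analyst.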

The Probable Model 

The model developed in this research uses K-means clustering within a two-stage process. In the first stage, data reduction is performed by applying the K-means algorithm to the selected training dataset. This process splits the training dataset into a number of smaller sets (clusters), and the clusters are classified as containing churners or non-churners. In the second stage, decision tree algorithms and other data mining algorithms are applied to the selected clusters in order to assess their performance and draw predictive conclusions about churning customers.
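The data-reduction stage relies on K-means. As a self-contained sketch (plain Lloyd's algorithm in standard-library Python, not the authors' implementation), each customer is assumed to be a numeric feature vector, for example built from the 12 customer variables defined for Step 1; the function returns the centroids and the cluster index of each customer.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's-algorithm K-means; returns (centroids, assignment per point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    assignment = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment
```

In practice the features should be scaled to comparable ranges first, since Euclidean distance lets large-valued variables such as total cost dominate small ones such as call ratio.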
Figure 1: Model for Churn Prediction (Step-1)

Implementation Algorithm for Step-1 

Read $C = \{x_1, x_2, x_3, \dots, x_n\}$

// C is the cluster of customers at one location of the whole data warehouse

// x is a customer, either a churner or a non-churner

where $C = C_{nc} \cup C_{ch}$

nc = non-churner

ch = churner

Read $C_{nc} = \{x_1, x_2, x_3, \dots, x_k\}$

// set of non-churning customers

Read $C_{ch} = \{x_{k+1}, x_{k+2}, \dots, x_n\}$

// set of churning customers

Here each record ($x_i$) of the cluster has a set of 12 variables. Therefore one customer record can be shown as:

$x_i = \{v_1, v_2, v_3, \dots, v_{12}\}$

where

$v_1$ = Call Ratio
$v_2$ = Average Call Distance
$v_3$ = Last Call Date
$v_4$ = First Call of Customer
$v_5$ = Life Span
$v_6$ = Time Distance between Two Calls (Call Frequency)
$v_7$ = Number of Days for Specific Call
$v_8$ = Total Incoming Calls
$v_9$ = Total Outgoing Calls
$v_{10}$ = Total Cost
$v_{11}$ = Incoming Call Duration
$v_{12}$ = Outgoing Call Duration
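A customer record with the 12 variables, and the split of a cluster C into `C_nc` and `C_ch`, can be sketched as follows. The field names and types are our own hypothetical choices, and the sketch assumes churn labels are already known for the training data (as the split into `C_nc` and `C_ch` implies), supplied here as a set of churner ids.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class CustomerRecord:
    """One record x_i with the 12 variables v_1..v_12 (types are assumptions)."""
    customer_id: str
    call_ratio: float              # v_1
    avg_call_distance: float       # v_2
    last_call_day: int             # v_3, as a day index (assumption)
    first_call_day: int            # v_4, as a day index (assumption)
    life_span_days: int            # v_5
    call_frequency_days: float     # v_6, time between two calls
    days_for_specific_call: int    # v_7
    total_incoming_calls: int      # v_8
    total_outgoing_calls: int      # v_9
    total_cost: float              # v_10
    incoming_duration_min: float   # v_11
    outgoing_duration_min: float   # v_12


def partition_cluster(cluster: List[CustomerRecord],
                      churner_ids: Set[str]) -> Tuple[List[CustomerRecord], List[CustomerRecord]]:
    """Split C into (C_nc, C_ch) using known churner ids (hypothetical input)."""
    c_nc = [x for x in cluster if x.customer_id not in churner_ids]
    c_ch = [x for x in cluster if x.customer_id in churner_ids]
    return c_nc, c_ch
```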

Read ($x_i$)

If (CN = "POC" OR CN = "COC")

Switch (CF)
{
Case 1:
    Call-Frequency >= 1 day
    CF = "NC"
    Break;

Case 2:
    Call-Frequency >= 2 days
    CF = "NC"
    Break;

Case 3:
    Call-Frequency >= 3 days
    CF = "POC"
    Break;

Case 4:
    Call-Frequency >= 7 days
    CF = "POC"
    Break;

Case 5:
    Call-Frequency >= 15 days
    CF = "COC"
    Break;

Default:
    CF = "POC"

$x_i = x_{k+i}$; $i = \{1, 2, \dots, n\}$
}

Else
    $x_i = x_{i+1}$;
    $i = i + 1$;
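Read literally, the cases of the switch overlap (a gap of 20 days satisfies every condition). A minimal sketch, assuming the cases are meant to be tested from the largest threshold down, maps the call-frequency gap $v_6$ to the labels used in the pseudocode. The paper does not expand POC and COC, the ordering is our interpretation, and `label_call_frequency` is a hypothetical name.

```python
# Thresholds from the Step-1 pseudocode, checked from the largest gap down so
# that each customer receives the label of the largest threshold reached.
THRESHOLDS = [
    (15.0, "COC"),  # Case 5: Call-Frequency >= 15 days
    (7.0, "POC"),   # Case 4: >= 7 days
    (3.0, "POC"),   # Case 3: >= 3 days
    (2.0, "NC"),    # Case 2: >= 2 days
    (1.0, "NC"),    # Case 1: >= 1 day
]
DEFAULT_LABEL = "POC"  # Default branch of the switch


def label_call_frequency(gap_days: float) -> str:
    """Map the time between two calls (v_6, in days) to NC / POC / COC."""
    for threshold, label in THRESHOLDS:
        if gap_days >= threshold:
            return label
    return DEFAULT_LABEL  # gaps under one day fall through to Default
```

Keeping the thresholds in a table rather than in nested conditionals makes it easy to retune the day limits for a different market without touching the logic.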



Figure 2: Model for Churn Prediction (Step-2) 

Implementation Algorithm for Step-2 

Non-churning and churning customers are classified by considering subperiods of 15 days (day > 7) in the two regular sets of observations.

Read $C_T = \{C_1, C_2, C_3, \dots, C_n\}$

// $C_i$ is a cluster, and $C_T$ is the set of all clusters
Read $C = \{x_1, x_2, x_3, \dots, x_n\}$

Read values from $x_i$, where $x_i \in C$


InMIN: Incoming Minutes
InFRQ: Incoming Frequency
OtMIN: Outgoing Minutes
OtFRQ: Outgoing Frequency
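To obtain InMIN(1), InMIN(2) and the other per-period quantities, raw call records have to be aggregated per observation subperiod. A minimal sketch, assuming each call is a hypothetical (day, direction, minutes) tuple and that the two subperiods are days 1-15 and 16-30; reading "frequency" as a count of calls is also our assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Call = Tuple[int, str, float]  # (day index, "in" or "out", minutes); hypothetical layout


def period_totals(calls: Iterable[Call], window: int = 15) -> Dict[str, Dict[int, float]]:
    """Return {"InMIN": {1: .., 2: ..}, "InFRQ": {...}, "OtMIN": {...}, "OtFRQ": {...}}."""
    totals: Dict[str, Counter] = {name: Counter() for name in ("InMIN", "InFRQ", "OtMIN", "OtFRQ")}
    for day, direction, minutes in calls:
        if not 1 <= day <= 2 * window:
            continue  # outside both observation subperiods
        period = 1 if day <= window else 2
        prefix = "In" if direction == "in" else "Ot"  # anything else counts as outgoing
        totals[prefix + "MIN"][period] += minutes  # minutes in the period
        totals[prefix + "FRQ"][period] += 1        # number of calls in the period
    # Report both periods, even when a customer made no calls in one of them.
    return {name: {p: float(c[p]) for p in (1, 2)} for name, c in totals.items()}
```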


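$$V_{\mathrm{InMIN}} = \frac{\mathrm{InMIN}(2) - \mathrm{InMIN}(1)}{\mathrm{InMIN}(1)}$$

$$V_{\mathrm{InFRQ}} = \frac{\mathrm{InFRQ}(2) - \mathrm{InFRQ}(1)}{\mathrm{InFRQ}(1)}$$

$$V_{\mathrm{OtMIN}} = \frac{\mathrm{OtMIN}(2) - \mathrm{OtMIN}(1)}{\mathrm{OtMIN}(1)}$$

$$V_{\mathrm{OtFRQ}} = \frac{\mathrm{OtFRQ}(2) - \mathrm{OtFRQ}(1)}{\mathrm{OtFRQ}(1)}$$

where (1) and (2) denote the first and second observation subperiods.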
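Each of the four formulas is the relative change of one measure from the first to the second subperiod. A minimal sketch, assuming the per-period values are held in a dictionary keyed by measure name and period number (a layout chosen for illustration); the zero-division guard is our addition, since the formula is undefined when the first-period value is 0.

```python
from typing import Dict


def relative_variation(first: float, second: float) -> float:
    """(X(2) - X(1)) / X(1) for one measure X over the two subperiods."""
    if first == 0:
        raise ValueError("variation is undefined when the first-period value is 0")
    return (second - first) / first


def variations(totals: Dict[str, Dict[int, float]]) -> Dict[str, float]:
    """Apply the formula to each measure, e.g. InMIN, InFRQ, OtMIN and OtFRQ."""
    return {name: relative_variation(per[1], per[2]) for name, per in totals.items()}
```

A negative value means the activity fell in the second subperiod; for example, 100 incoming minutes followed by 40 gives a variation of -0.6.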


If (($V_{\mathrm{OtFRQ}} < \mathrm{OtFRQ}(1)$ || $V_{\mathrm{InFRQ}} < \mathrm{InFRQ}(1)$) && ($V_{\mathrm{OtMIN}} < \mathrm{OtMIN}(1)$ || $V_{\mathrm{InMIN}} < \mathrm{InMIN}(1)$))

    Churn = "COC"

Else

    Churn = "NC"
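The decision rule can be sketched directly. This applies the condition exactly as written, comparing each relative variation with the first-subperiod value of the same measure; we keep that comparison unchanged rather than guess at an intended threshold. The dictionary layout and the name `churn_label` are hypothetical.

```python
from typing import Dict


def churn_label(v: Dict[str, float], first: Dict[str, float]) -> str:
    """Apply the Step-2 rule literally.

    v: relative variations keyed "InMIN", "InFRQ", "OtMIN", "OtFRQ"
    first: first-subperiod values keyed the same way
    """
    frequency_drop = v["OtFRQ"] < first["OtFRQ"] or v["InFRQ"] < first["InFRQ"]
    minutes_drop = v["OtMIN"] < first["OtMIN"] or v["InMIN"] < first["InMIN"]
    # Churn requires a drop in call frequency AND a drop in call minutes.
    return "COC" if frequency_drop and minutes_drop else "NC"
```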


Using the features described above as the basic input data, a decision tree is suitable for building the predictive model that tries to identify churn.

Different predictive methods, algorithms, and techniques can now be used for the different clusters. The decision tree (CART algorithm) is applied to the training dataset.

Read $C_T = \{C_1, C_2, C_3, \dots, C_n\}$

DT = DT($C_i$)
// $C_i$ stands for the various clusters
// DT is the decision tree
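The step DT = DT($C_i$) trains one decision tree per cluster. The sketch below is a deliberately small CART-style learner in standard-library Python (binary splits chosen by Gini impurity, majority-label leaves, a depth limit); it is an illustration, not a full CART implementation with pruning. Clusters are assumed to arrive as a dictionary mapping a hypothetical cluster id to (feature rows, labels), and every cluster is assumed to be non-empty.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Row = Sequence[float]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows: List[Row], labels: List[str]) -> Optional[Tuple[int, float]]:
    """Return the (feature, threshold) with the lowest weighted Gini, if it improves."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows: List[Row], labels: List[str], depth: int = 0, max_depth: int = 3):
    """Grow a binary tree: internal nodes are (feature, threshold, left, right) tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, row: Row) -> str:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def trees_per_cluster(clusters: Dict[int, Tuple[List[Row], List[str]]]) -> Dict[int, object]:
    """DT = DT(C_i): train one tree on each cluster's rows and labels."""
    return {cid: build_tree(rows, labels) for cid, (rows, labels) in clusters.items()}
```

For real data, a library implementation such as scikit-learn's `DecisionTreeClassifier`, which its documentation describes as an optimised version of CART, would normally replace this hand-written learner.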

Different decision tree algorithms and other techniques are applied to the various clusters for predictive purposes:

• CART algorithm 

• C5.0 algorithm 

• CHAID Algorithm 

• Cost-Sensitive learning method 

• Neural Network Technique (Performance) 
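Running several algorithms on the same clusters and keeping the best one per cluster can be sketched generically. Here each algorithm is any hypothetical `learner` function that trains on a cluster and returns a predictor; the two learners shown are toy stand-ins rather than CART, C5.0, CHAID, cost-sensitive learning or a neural network, and using test-set accuracy as the selection criterion is our assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
# A learner trains on (rows, labels) and returns a predict function.
Learner = Callable[[List[Row], List[str]], Callable[[Row], str]]


def accuracy(predict: Callable[[Row], str], rows: List[Row], labels: List[str]) -> float:
    hits = sum(predict(r) == y for r, y in zip(rows, labels))
    return hits / len(labels) if labels else 0.0


def best_algorithm_per_cluster(
    learners: Dict[str, Learner],
    train: Dict[int, Tuple[List[Row], List[str]]],
    test: Dict[int, Tuple[List[Row], List[str]]],
) -> Dict[int, Tuple[str, float]]:
    """Train every learner on every cluster and keep the best test accuracy."""
    result: Dict[int, Tuple[str, float]] = {}
    for cid, (tr_rows, tr_labels) in train.items():
        te_rows, te_labels = test[cid]
        scores = {name: accuracy(learn(tr_rows, tr_labels), te_rows, te_labels)
                  for name, learn in learners.items()}
        best = max(scores, key=scores.get)
        result[cid] = (best, scores[best])
    return result


def majority_learner(rows: List[Row], labels: List[str]) -> Callable[[Row], str]:
    """Baseline stand-in: always predicts the most common training label."""
    top = max(set(labels), key=labels.count)
    return lambda row: top


def threshold_learner(rows: List[Row], labels: List[str]) -> Callable[[Row], str]:
    """Toy stand-in: predicts "COC" when feature 0 exceeds the training mean."""
    mean = sum(r[0] for r in rows) / len(rows)
    return lambda row: "COC" if row[0] > mean else "NC"
```

Because every algorithm sees the same train and test split of each cluster, the scores are directly comparable, which is what allows the best result per cluster to be chosen.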

DISCUSSIONS 

Stage one of the algorithm is based on finding the symptoms of churning customers, or the possibility of churn. The prime task accomplished here is clustering customers according to their locations. The robustness of the algorithm lies in filtering customers through the data of various variables, around 10-12 for each customer. At the end of this stage, each customer is assigned to the relevant category, from which the future possibilities for that customer can be predicted.

The second stage of the model gives wide scope for applying traditional data mining algorithms to the filtered data simultaneously. Running them simultaneously makes it possible to compare the results of the various algorithms and to analyze those results from different angles. The best result can then be chosen as the basis for action.

CONCLUSIONS 

This paper presents a hypothetical view and an algorithm that provide a roadmap for solving the churn prediction problem. The model combines various methods that were earlier used separately for such problems, contributing a combined approach for finding the optimum solution. The two-step strategy of the model offers ample opportunity to approach the problem from every direction, both technically and statistically. The proposed two-step hypothetical model may therefore provide a solution suited to the nature of the researchers' requirements. Because established data mining rules and algorithms form the basis of the model, there is less chance of generating spurious results.